Data extraction

Data extraction is the act or process of retrieving data out of (usually unstructured or poorly structured) data sources for further data processing or data storage. The import into the intermediate extracting system is thus usually followed by data transformation and possibly the addition of metadata prior to export to another stage in the data workflow.
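
The sequence described above (extract, transform, annotate, then hand over to the next stage) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the semicolon-separated input, the function names and the provenance fields `source` and `extracted_at` are all hypothetical and do not come from any particular tool.

```python
import csv
import io
from datetime import datetime, timezone

# Hypothetical, poorly structured source text (stray whitespace, all strings).
RAW = "name;price\nundershirt; 14.00 \nsocks; 3.50 \n"

def extract(raw_text):
    """Pull rows out of the source as plain dictionaries of strings."""
    return list(csv.DictReader(io.StringIO(raw_text), delimiter=";"))

def transform(rows):
    """Normalise whitespace and convert the price to a number."""
    return [{"name": r["name"].strip(), "price": float(r["price"])} for r in rows]

def add_metadata(rows, source):
    """Attach assumed provenance fields before the records are exported."""
    stamp = datetime.now(timezone.utc).isoformat()
    return [dict(r, source=source, extracted_at=stamp) for r in rows]

# The records are now ready to be passed to the next stage.
records = add_metadata(transform(extract(RAW)), source="example.txt")
```

Keeping the three steps as separate functions mirrors the separation in the text: the extraction step only retrieves, and later stages are free to change without touching it.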

Usually, the term data extraction is applied when (experimental) data is first imported into a computer from primary sources, like measuring or recording devices. Today's electronic devices will usually present an electrical connector (e.g. USB) through which 'raw data' can be streamed into a personal computer.


Data sources
Typical unstructured data sources include web pages, emails, documents, PDFs, social media, scanned text, mainframe reports, spool files, multimedia files, etc. Extracting data from these unstructured sources has grown into a considerable technical challenge. Whereas historically data extraction had to deal with changes in physical hardware formats, the majority of current data extraction deals with extracting data from these unstructured data sources and from different software formats. This growing process of data extraction from the web is referred to as "Web data extraction" or "Web scraping".
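
As a small illustration of web data extraction, the sketch below pulls link targets and link text out of an HTML fragment using only Python's standard `html.parser` module. The `LinkExtractor` class and the sample `PAGE` markup are made up for this example; real pages are usually messier and often call for a dedicated parsing library.

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect (href, text) pairs for every <a> element that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the link currently being read, if any
        self._text = []     # text fragments seen inside that link

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None

# Hypothetical page fragment.
PAGE = (
    '<ul><li><a href="/etl">Extract, <b>transform</b>, load</a></li>'
    '<li><a href="/ie">Information extraction</a></li></ul>'
)

parser = LinkExtractor()
parser.feed(PAGE)
parser.close()
links = parser.links
```

The result is a list of structured pairs that can be stored or transformed further, which is the point of the extraction step.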


Imposing structure
The act of adding structure to unstructured data takes a number of forms:
  • Using text pattern matching such as regular expressions to identify small or large-scale structure, e.g. records in a report and their associated data from headers and footers;
  • Using a table-based approach to identify common sections within a limited domain, e.g. in emailed resumes, identifying skills, previous work experience, qualifications etc. using a standard set of commonly used headings (these would differ from language to language), e.g. Education might be found under Education/Qualification/Courses;
  • Using text analytics to attempt to understand the text and link it to other information.
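
The first two approaches in the list above can be sketched as follows. Everything here is an assumption made for illustration: the report layout, the `RECORD` pattern, the `HEADINGS` table and the sample texts are invented, and a real heading table would need entries per language and domain.

```python
import re

# Hypothetical report: records sit between a page header and a page footer.
REPORT = """\
SALES REPORT        PAGE 1
1001  undershirt     14.00
1002  socks           3.50
END OF PAGE 1
"""

# A record line is assumed to be: 4-digit code, item name, price with 2 decimals.
RECORD = re.compile(r"^(?P<code>\d{4})\s+(?P<item>.+?)\s+(?P<price>\d+\.\d{2})\s*$")

def extract_records(text):
    """Keep only the lines that match the record pattern; skip headers and footers."""
    records = []
    for line in text.splitlines():
        m = RECORD.match(line)
        if m:
            records.append(
                {"code": m["code"], "item": m["item"], "price": float(m["price"])}
            )
    return records

# Table of commonly used headings mapped to one canonical section name.
HEADINGS = {
    "education": "education",
    "qualifications": "education",
    "courses": "education",
    "skills": "skills",
    "work experience": "experience",
    "employment history": "experience",
}

# Hypothetical resume text.
RESUME = """\
Qualifications
BSc Computer Science
Skills
Python, SQL
Employment history
Data analyst, 3 years
"""

def split_sections(text):
    """Group lines under the canonical section of the most recent known heading."""
    sections, current = {}, None
    for line in text.splitlines():
        key = HEADINGS.get(line.strip().rstrip(":").lower())
        if key:
            current = key
            sections.setdefault(current, [])
        elif current and line.strip():
            sections[current].append(line.strip())
    return sections

records = extract_records(REPORT)
sections = split_sections(RESUME)
```

The regular expression handles layout that repeats line by line, while the heading table handles variation in vocabulary; the two are often combined in practice.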


See also
  • Data mining, discovery of patterns in large data sets using statistics, database knowledge or machine learning
  • Data retrieval, obtaining data from a database management system, often using a query with a set of criteria
  • Extract, transform, load (ETL), procedure for copying data from one or more sources, transforming the data at the source system, and copying into a destination system
  • Information extraction, automated extraction of structured information from unstructured or semi-structured machine-readable data, for example using natural language processing to extract content from images, audio or documents
